Path Enumeration
Explore the first solution of the Naive Trees Antipattern.
We'll cover the following
One weakness of an Adjacency List is that it’s expensive to retrieve the ancestors of a given node in the tree. In Path Enumeration, this is solved by storing the string of ancestors as an attribute of each node.
Path Enumeration in directory hierarchies#
We can see a form of Path Enumeration in directory hierarchies. A UNIX path like /usr/local/lib/
is a Path Enumeration of the filesystem, where usr
is the parent of local
, which in turn is the parent of lib
.
Creating Comments
table with a path
column#
To employ the Path Enumeration solution, we can define a column called path
as a long VARCHAR
, instead of the parent_id
column, in the Comments
table. The string stored in this column is the sequence of ancestors of the current row, from the top of the tree down, just like a UNIX path. We can even choose / as a separator character.
Querying the Comments
table#
This section will experiment on the table with some queries to find out the ancestors and the descendants.
Finding out ancestors of a comment#
We can query ancestors by comparing the current row’s path to a pattern formed from the path of another row. For example, we will use the following query to find the ancestors of comment 7
, whose path is 1/4/6/7/
.
This matches the patterns formed from paths of ancestors 1/4/6/%
, 1/4/%
, and 1/%
.
Finding out the descendants of a comment#
We can query descendants by reversing the arguments of the LIKE
predicate. To find the descendants of comment 4
whose path is 1/4/
, we can use the following query:
The pattern 1/4/%
matches the paths of descendants 1/4/5/
, and 1/4/6/
, and 1/4/6/7/
.
Counting the number of comments#
Once we can select a subset of the tree or the chain of ancestors to the top of the tree, we can perform many other queries easily, such as computing the SUM()
of costs of nodes in a subtree or simply counting the number of nodes. For example, to count the comments per author in the subtree starting at comment 4
, we can do this:
Test it yourself#
Try to run the queries in the following playground:
/
- main.sql
You can copy and paste the queries in the playground to see the effect on the database.
How manipulation in a tree affects the model#
The process of inserting a node is similar to that of inserting in the Adjacency List model. We can insert a non-leaf node without needing to modify any other row.
To do this, let’s copy the path
from the new node’s parent, and append the ID of the new node to this string. If our primary key generates its value automatically during the insert, we may need to insert the row and then update the path
once we know the ID value for the new row. For example, if we use MySQL, the built-in function LAST_INSERT_ID()
returns the most recent ID value generated for an inserted row in the current session. We can get the rest of the path from the parent of our new node.
Let’s retrieve the data in the Comments
table after inserting a record.
Path Enumeration oddity#
Path Enumeration has some drawbacks similar to those shown in the chapter Jaywalking. The database can’t enforce that the path is formed correctly or that values in the path correspond to existing nodes. Maintaining the path string depends on the application code, and verifying it is costly.
Moreover, no matter how long we make the VARCHAR
column, it still has a length limit, so it doesn’t strictly support trees of unlimited depth.
Path Enumeration allows us to sort a set of rows easily by their hierarchy, as long as the elements between the separator are of a consistent length.